Readability and Information Quality in Cancer Information From a Free vs Paid Chatbot

This cross-sectional study examines the readability and quality differences in cancer information found on free and paywalled versions of a chatbot.


Introduction
For example, patients with low socioeconomic status have poorer cancer care and outcomes than patients with higher income.1 Health literacy may be key in addressing these disparities.4,5 Patients with higher health literacy are more likely to adhere to screening protocols for breast, cervical, and colorectal cancer than patients with lower health literacy.3 One study showed that patients with prostate cancer were more likely to regret their treatment if they demonstrated lower health literacy.6 Low health literacy is especially concerning in environments where patients encounter inaccessible or inaccurate information, physician shortages, and language or cultural differences.3 For this reason, the American Medical Association recommends that patient-facing health information be written at a sixth grade or lower reading level.7
The emergence of consumer-facing large language models (LLMs) enables patients to use chatbots to access health-related information. While the use of chatbots for seeking health-related information has become increasingly promising in recent years,8 considering the downstream impact of these chatbots and their implications for health equity is important. Consumer health care information is often not individualized to patients' literacy levels.9 A recent study10 found artificial intelligence (AI) outputs propagating biased health care information associated with gender, race, ethnicity, and socioeconomic status. Other studies11-13 have shown AI chatbot responses to be presented above recommended reading levels for consumer health information, which could limit patients' abilities to understand information and take appropriate action for their care.
One paywalled chatbot requires a $20 per month subscription, presenting another accessibility barrier for low-income individuals who may benefit from the additional knowledge and reasoning capabilities provided by the paid version. Given the cost and performance differences between the free and paywalled versions of the chatbot, this study aimed to evaluate the quality and readability of the 2 versions' responses to common queries about cancer. We sought to determine if inequities exist in AI by investigating whether the paywalled chatbot provided easier-to-read responses compared with the free version and whether prompts could address any observed differences in readability and quality of cancer information.

Methods
This cross-sectional study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.14 The study was determined exempt from review and the requirement of informed consent by the SUNY Downstate Health Sciences University institutional review board.
The 5 types of cancer with the highest projected mortality (lung, breast, prostate, colorectal, and skin cancers) were used as search terms in this study.15 The top 5 Google Trends queries associated with the search terms lung cancer, breast cancer, prostate cancer, colorectal cancer, and skin cancer were collected on December 1, 2023, and used as chatbot inputs. The search tracker was used to identify publicly searched cancer terms, and its settings were: US location, time
Trained researchers (A.P. and D.M.), blinded to the chatbot model and input prompting, evaluated responses using the validated DISCERN questionnaire. DISCERN rates the quality of health information for average health care consumers from a variety of information sources.16 The graders reported DISCERN scores as the mean of the 15 questions in the questionnaire, rounded to the nearest whole number, to represent the overall DISCERN score, which was treated as DISCERN's sixteenth question score. The rounded whole numbers were then averaged within each group (chatbot version 3.5 prompted, chatbot version 3.5 nonprompted, chatbot version 4.0 prompted, and chatbot version 4.0 nonprompted) to get a total of 100 unique scores. Interrater variability was calculated between evaluator DISCERN scores using a 2-way random-effects intraclass correlation for agreement.
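As an illustration of the interrater calculation described above, the following minimal Python sketch computes a 2-way random-effects intraclass correlation for absolute agreement. It assumes the single-rater form, often written ICC(2,1); the text does not state whether the single-rater or average-rater form was used, and the function name `icc_2_1` and the example ratings are hypothetical.

```python
def icc_2_1(ratings):
    """Two-way random-effects ICC, absolute agreement, single rater.

    ratings: list of rows, one row per rated response, one column per rater.
    ASSUMPTION: single-rater form; the study text does not specify the form.
    """
    n = len(ratings)          # number of rated responses (subjects)
    k = len(ratings[0])       # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 rows and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-response mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical overall DISCERN scores from 2 raters for 5 responses.
    example = [[4, 4], [3, 4], [4, 3], [2, 2], [5, 4]]
    print(round(icc_2_1(example), 3))
```

Because the between-rater mean square appears in the denominator, systematic differences between the two graders lower this coefficient, which is what distinguishes an agreement ICC from a consistency ICC.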

Statistical Analysis
To evaluate response readability, Flesch-Kincaid Reading Ease and Grade Level were calculated for each output (eTable 3 in Supplement 1).17,18 Values were compared pairwise between the free and paywalled chatbot, for both prompted and unprompted responses, using a 2-sided paired t test with Benjamini-Hochberg adjustment for multiple comparisons.
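For readers unfamiliar with the readability metrics named above, the sketch below applies the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas in Python. The syllable counter is a rough vowel-group heuristic and an assumption of this example; it is not the tool used in the study, so scores will differ somewhat from published values. The function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, not exact)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic ("make"), but keep "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Standard published coefficients for both formulas.
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


if __name__ == "__main__":
    ease, grade = readability("Skin cancer is common. See a doctor if a mole changes.")
    print(f"reading ease {ease:.1f}, grade level {grade:.1f}")
```

Higher Reading Ease means easier text, whereas a higher Grade Level means harder text, so the two metrics move in opposite directions when a response is simplified.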

Discussion
To our knowledge, this is the first study comparing the readability and quality of cancer information produced by free vs paid versions of a chatbot, with and without prompts. Increased readability was associated with a prompt for the free version. This could have implications for reducing patients' access to reader-friendly information that could better empower them to gain knowledge and action steps for their own care and reduce potential disparities, such as adherence to cancer screening protocols or regret following treatment.3,6
This study also found comparable quality of consumer health information between free and paid versions of the chatbot with and without prompting. The association of increased readability with the chatbot's ability to maintain consistent overall DISCERN scores (eg, median scores ranged from 3.6 to 3.8) is promising for the text simplification capabilities of LLMs. Simplifying text did not reduce the quality of health information in this study. The lowest subdomain DISCERN scores were in the categories involving references and treatment. Specifically, the free and paid versions of the chatbot did not cite their sources, and treatment information lacked discussion of risks, benefits, and overall impact on quality of life. Fortunately, the chatbot scored high in subdomains specifying the aims of outputs and avoiding biases in its responses.
It is problematic that both the free and paid versions of the chatbot gave suboptimal readability scores (version 3.5: reading ease, 52.60; grade level, 11.6; version 4.0: reading ease, 62.48; grade level, 10.4) that exceed the recommended sixth grade reading level for consumer health information. Indeed, we found responses at a median 12th grade reading level in the free chatbot responses about cancer, which is consistent with data from a previous study suggesting little change in the chatbot's readability over time.10 A previous study found that the chatbot's information on orthopedic pathologies was significantly more difficult to read than similar information on health-related websites.19 The results of this study suggest that prompting may help to minimize the issue of reading difficulty.
Our study suggests that prompting the chatbot to respond at a sixth grade reading level was associated with increased reading ease in both the free and paid chatbot versions, yielding 8th grade to 10th grade level responses (Figure). Therefore, an extra step prompting the chatbot to respond at a lower grade reading level can make health information more readable and further accessible for laypeople. After prompting, the chatbot may have inappropriately interpreted its audience to be a sixth grader rather than providing a Flesch-Kincaid sixth grade reading level response. For example, some prompted responses recommended discussing cancer issues with an adult or teacher in addition to a doctor, implying a flawed prompt interpretation. Despite prompting the chatbot to give responses at a sixth grade reading level, median responses were still at a higher grade level. These data are consistent with a previous study that prompted the chatbot to give information at a more readable level but produced responses at the 10th grade level.20 Further investigation is warranted to identify better chatbot prompts that improve information quality and readability and to determine how to prompt chatbots to better understand human requests.

Limitations
This study has limitations. We limited the study to using top search queries according to the search tracker's data, as opposed to the top searched queries submitted by the public to the paid and unpaid chatbot software, because those data were not available to the public. Also, chatbot responses were not restricted to a set length. Stochasticity in response lengths can be a potential confounder for readability and information quality. Finally, our study did not address accuracy in prompted and nonprompted chatbot responses. A previous study21 found that the chatbot produced accurate responses to 11 of 13 directed questions on common myths about cancer.

Conclusions
These findings suggest that chatbots have the potential to contribute to cancer health inequities when responses are presented above the recommended sixth grade reading level for consumer health information, particularly when the general public is seeking cancer information with the unpaid version of a chatbot. However, prompting the chatbot to respond to common queries about cancer at a sixth grade reading level provides an associated increase in readability with no significant change in health information quality. This presents an opportunity for physicians to coach their patients on how to search for information on the chatbot at a more readable level.

JAMA Network Open | Health Informatics. JAMA Network Open. 2024;7(7):e2422275. doi:10.1001/jamanetworkopen.2024.22275. July 26, 2024.

Figure. Readability and Quality Metrics of GPT-Generated Text
Comparisons used 2-sided paired t tests with Benjamini-Hochberg adjustment for multiple comparisons, for a total of 6 unique t tests. P < .05 was considered significant. Data analyses were done in R version 4.3.2 (R Project for Statistical Computing). Analyses were performed from December 20, 2023, to January 15, 2024.
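The paired comparison and Benjamini-Hochberg adjustment mentioned with the Figure can be sketched as follows. This is an illustrative Python version rather than the authors' R code. It computes only the paired t statistic and degrees of freedom; converting these to a P value requires a t distribution, for example via `scipy.stats.ttest_rel` in SciPy or `t.test(..., paired = TRUE)` in R. The function names and the example P values are hypothetical.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("paired samples must have equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each adjusted
    # value is the minimum of p * m / rank over all equal or larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw P values from 6 pairwise tests.
    raw = [0.001, 0.20, 0.03, 0.04, 0.008, 0.60]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw {p:.3f} -> adjusted {q:.3f}")
```

An adjusted value below .05 would then be read as significant under the threshold stated above.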